Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV][llvm] Handle ptr element type in lowerDeinterleaveIntrinsicToLoad and lowerInterleaveIntrinsicToStore #107079

Merged
merged 3 commits into from
Sep 5, 2024

Conversation

4vtomat
Copy link
Member

@4vtomat 4vtomat commented Sep 3, 2024

Resolve #106970

currently it returns 0 fixed size for ptr element type. The ptr
element size should depend on XLen which is 64 in riscv64 and 32 in
riscv32 respectively.

…cToLoad` and `lowerInterleaveIntrinsicToStore`

currently it returns 0 fixed size for `ptr` element type. The `ptr`
element size should depend on `XLen` which is 64 in riscv64 and 32 in
riscv32 respectively.
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 3, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Brandon Wu (4vtomat)

Changes

Resolve #106970

currently it returns 0 fixed size for ptr element type. The ptr
element size should depend on XLen which is 64 in riscv64 and 32 in
riscv32 respectively.


Full diff: https://github.com/llvm/llvm-project/pull/107079.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+6-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll (+20-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll (+19-2)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index f50d378ed97aa6..b9c096af6e9b65 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -21951,7 +21951,9 @@ bool RISCVTargetLowering::lowerDeinterleaveIntrinsicToLoad(
         Intrinsic::riscv_vlseg6, Intrinsic::riscv_vlseg7,
         Intrinsic::riscv_vlseg8};
 
-    unsigned SEW = ResVTy->getElementType()->getScalarSizeInBits();
+    unsigned SEW = ResVTy->getElementType()->isPointerTy()
+                       ? Subtarget.getXLen()
+                       : ResVTy->getElementType()->getScalarSizeInBits();
     unsigned NumElts = ResVTy->getElementCount().getKnownMinValue();
     Type *VecTupTy = TargetExtType::get(
         LI->getContext(), "riscv.vector.tuple",
@@ -22021,7 +22023,9 @@ bool RISCVTargetLowering::lowerInterleaveIntrinsicToStore(
         Intrinsic::riscv_vsseg6, Intrinsic::riscv_vsseg7,
         Intrinsic::riscv_vsseg8};
 
-    unsigned SEW = InVTy->getElementType()->getScalarSizeInBits();
+    unsigned SEW = InVTy->getElementType()->isPointerTy()
+                       ? Subtarget.getXLen()
+                       : InVTy->getElementType()->getScalarSizeInBits();
     unsigned NumElts = InVTy->getElementCount().getKnownMinValue();
     Type *VecTupTy = TargetExtType::get(
         SI->getContext(), "riscv.vector.tuple",
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll
index e2f956ca03ff8e..54373d94f8f5f3 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zfh,+zvfh,+m | FileCheck %s
-; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zfh,+zvfh,+m | FileCheck %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zfh,+zvfh,+m | FileCheck --check-prefixes=CHECK,RV32 %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zfh,+zvfh,+m | FileCheck --check-prefixes=CHECK,RV64 %s
 
 ; Integers
 
@@ -263,9 +263,27 @@ define {<vscale x 2 x double>, <vscale x 2 x double>} @vector_deinterleave_load_
   ret {<vscale x 2 x double>, <vscale x 2 x double>} %retval
 }
 
+define {<vscale x 2 x ptr>, <vscale x 2 x ptr>} @vector_deinterleave_load_nxv2p0_nxv4p0(ptr %p) {
+; RV32-LABEL: vector_deinterleave_load_nxv2p0_nxv4p0:
+; RV32:       # %bb.0:
+; RV32-NEXT:    vsetvli a1, zero, e32, m1, ta, ma
+; RV32-NEXT:    vlseg2e32.v v8, (a0)
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: vector_deinterleave_load_nxv2p0_nxv4p0:
+; RV64:       # %bb.0:
+; RV64-NEXT:    vsetvli a1, zero, e64, m2, ta, ma
+; RV64-NEXT:    vlseg2e64.v v8, (a0)
+; RV64-NEXT:    ret
+  %vec = load <vscale x 4 x ptr>, ptr %p
+  %retval = call {<vscale x 2 x ptr>, <vscale x 2 x ptr>} @llvm.vector.deinterleave2.nxv4p0(<vscale x 4 x ptr> %vec)
+  ret {<vscale x 2 x ptr>, <vscale x 2 x ptr>} %retval
+}
+
 declare {<vscale x 2 x half>,<vscale x 2 x half>} @llvm.vector.deinterleave2.nxv4f16(<vscale x 4 x half>)
 declare {<vscale x 4 x half>, <vscale x 4 x half>} @llvm.vector.deinterleave2.nxv8f16(<vscale x 8 x half>)
 declare {<vscale x 2 x float>, <vscale x 2 x float>} @llvm.vector.deinterleave2.nxv4f32(<vscale x 4 x float>)
 declare {<vscale x 8 x half>, <vscale x 8 x half>} @llvm.vector.deinterleave2.nxv16f16(<vscale x 16 x half>)
 declare {<vscale x 4 x float>, <vscale x 4 x float>} @llvm.vector.deinterleave2.nxv8f32(<vscale x 8 x float>)
 declare {<vscale x 2 x double>, <vscale x 2 x double>} @llvm.vector.deinterleave2.nxv4f64(<vscale x 4 x double>)
+declare {<vscale x 2 x ptr>, <vscale x 2 x ptr>} @llvm.vector.deinterleave2.nxv4p0(<vscale x 4 x ptr>)
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll b/llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll
index 5ebf63f0a4411e..a06aa2d02b11b5 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zfh,+zvfh | FileCheck %s
-; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zfh,+zvfh | FileCheck %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zfh,+zvfh | FileCheck --check-prefixes=CHECK,RV32 %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zfh,+zvfh | FileCheck --check-prefixes=CHECK,RV64 %s
 
 ; Integers
 
@@ -218,6 +218,22 @@ define void @vector_interleave_store_nxv4f64_nxv2f64(<vscale x 2 x double> %a, <
   ret void
 }
 
+define void @vector_interleave_store_nxv4p0_nxv2p0(<vscale x 2 x ptr> %a, <vscale x 2 x ptr> %b, ptr %p) {
+; RV32-LABEL: vector_interleave_store_nxv4p0_nxv2p0:
+; RV32:       # %bb.0:
+; RV32-NEXT:    vsetvli a1, zero, e32, m1, ta, ma
+; RV32-NEXT:    vsseg2e32.v v8, (a0)
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: vector_interleave_store_nxv4p0_nxv2p0:
+; RV64:       # %bb.0:
+; RV64-NEXT:    vsetvli a1, zero, e64, m2, ta, ma
+; RV64-NEXT:    vsseg2e64.v v8, (a0)
+; RV64-NEXT:    ret
+  %res = call <vscale x 4 x ptr> @llvm.vector.interleave2.nxv4p0(<vscale x 2 x ptr> %a, <vscale x 2 x ptr> %b)
+  store <vscale x 4 x ptr> %res, ptr %p
+  ret void
+}
 
 declare <vscale x 4 x half> @llvm.vector.interleave2.nxv4f16(<vscale x 2 x half>, <vscale x 2 x half>)
 declare <vscale x 8 x half> @llvm.vector.interleave2.nxv8f16(<vscale x 4 x half>, <vscale x 4 x half>)
@@ -225,3 +241,4 @@ declare <vscale x 4 x float> @llvm.vector.interleave2.nxv4f32(<vscale x 2 x floa
 declare <vscale x 16 x half> @llvm.vector.interleave2.nxv16f16(<vscale x 8 x half>, <vscale x 8 x half>)
 declare <vscale x 8 x float> @llvm.vector.interleave2.nxv8f32(<vscale x 4 x float>, <vscale x 4 x float>)
 declare <vscale x 4 x double> @llvm.vector.interleave2.nxv4f64(<vscale x 2 x double>, <vscale x 2 x double>)
+declare <vscale x 4 x ptr> @llvm.vector.interleave2.nxv4p0(<vscale x 2 x ptr>, <vscale x 2 x ptr>)

…ntrinsicToLoad` and `lowerInterleaveIntrinsicToStore`
Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing this!

…ntrinsicToLoad` and `lowerInterleaveIntrinsicToStore`
Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@4vtomat 4vtomat merged commit 1465e23 into llvm:main Sep 5, 2024
6 of 8 checks passed
@4vtomat 4vtomat deleted the issue_106970 branch September 5, 2024 04:46
@mshabunin
Copy link

Apparently this change also fixes a problem with illegal instructions being generated for some RISC-V RVV code which we have observed with llvm 18 and 19. I'm not sure if it is a coincidence or both issues have the same root cause. I've found this commit and PR by bisecting llvm between start of 20th dev cycle and some commit at the main (10c6d63 - 4822e9d).

Our issue was with vfncvt instruction being added in the generated code with register overlap (source is a register group of 2 = v8-v9 and destination is a single register v9) which is illegal instruction:

vsetvli	a0, zero, e32, m1, ta, ma
vfncvt.f.f.w	v9, v8

I couldn't create small reproducer, so I'm attaching a larger one for completeness: repro-19.1.0.zip

@topperc
Copy link
Collaborator

topperc commented Sep 30, 2024

Apparently this change also fixes a problem with illegal instructions being generated for some RISC-V RVV code which we have observed with llvm 18 and 19. I'm not sure if it is a coincidence or both issues have the same root cause. I've found this commit and PR by bisecting llvm between start of 20th dev cycle and some commit at the main (10c6d63 - 4822e9d).

Our issue was with vfncvt instruction being added in the generated code with register overlap (source is a register group of 2 = v8-v9 and destination is a single register v9) which is illegal instruction:

vsetvli	a0, zero, e32, m1, ta, ma
vfncvt.f.f.w	v9, v8

I couldn't create small reproducer, so I'm attaching a larger one for completeness: repro-19.1.0.zip

I checked out 19.1 and built it. I don't see vfncvt.f.f.w v9, v8. The only vfncvt.f.f.w I see is vfncvt.f.f.w v10, v8.

@mshabunin
Copy link

Sorry for misleading comment, with that reproducer incorrect instruction is vfncvt.x.f.w v11,v10 with m1 group:

/work/chains/llvm-19.1.0/bin/llvm-objdump -d test_intrin-72f8b0.o | grep -B 2 vfncvt
<cut>
     50e: 0aab5557     	vfsub.vf	v10, v10, fs6
     512: 0d007557     	vsetvli	a0, zero, e32, m1, ta, ma
     516: 4aa895d7     	vfncvt.x.f.w	v11, v10
<cut>

@topperc
Copy link
Collaborator

topperc commented Sep 30, 2024

Sorry for misleading comment, with that reproducer incorrect instruction is vfncvt.x.f.w v11,v10 with m1 group:

/work/chains/llvm-19.1.0/bin/llvm-objdump -d test_intrin-72f8b0.o | grep -B 2 vfncvt
<cut>
     50e: 0aab5557     	vfsub.vf	v10, v10, fs6
     512: 0d007557     	vsetvli	a0, zero, e32, m1, ta, ma
     516: 4aa895d7     	vfncvt.x.f.w	v11, v10
<cut>

It's possible this was fixed by bf8101e

@mshabunin
Copy link

It's possible this was fixed by bf8101e

Yes, it seems so. I don't observe this issue with our tests with the patch #110603.
Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RISCV] Assertion failed: (MinNumElts > 0 && "#Elements of a VectorType must be greater than 0")
6 participants